Motivation 

* SEU analysis of a system is complex. 

* Currently, system SEU analysis is performed by 
component level partitioning and then: 

- Use the most dominant σ_SEUs for system error rate calculations, 
or 

- Sum component σ_SEUs for system error rate calculations. 

* In many cases, system error rates are overestimated. 

* Overestimation can cause overdesign: 

- Cost, schedule, functionality, and validation/verification can be 
compromised. 

* The scope of this presentation is to discuss the risks 
involved with our current method of SEU analysis for 
complex systems. 
Scope of Systems Regarding This Presentation 

* Board or box level group 
of components: 

- FPGA, ASIC, ADC, 
microprocessor, 
microcontroller, memory, 
oscillator, voltage regulator, 
operational amplifier, etc. 

• Network of components 
within a digital design 
implemented in an ASIC or 
FPGA 

- DFFs, combinatorial logic, 
clock managers (DCMs), 
look up tables (LUTs), etc. 



Complex System SEU Evaluation 

* Challenges of evaluating complex systems: 

- Fitting the entire system in an accelerated beam, 

- Having the entire system accessible for testing, 

- Enhancing the visibility of SEU-induced system errors, 

- Controlling and monitoring the system during accelerated 
testing, and 

- Performing SEU data analysis. 

* Hence, SEU testing is generally performed using 
system partitions. 

- Partitioned component co-dependencies within the system 
should be determined and taken into account when performing 
SEU analysis. 

- Generally, there should not be just one SEU error rate for a 
system. Completely independent applications should have 
unique SEU error rates calculated. 
Component Level Error Rates versus 
Error Responses 



• SEU error rates: How often a component 
reaches an erroneous-state due to induced 
noise from ionization (SET or SEU). 

• SEU error response: What happens when a 
component incurs an SET or SEU. 

• Component error rates are generally 
obtained from accelerated testing and σ_SEU 
extrapolation. 

• Other fault injection techniques exist, 
however, they are generally used for error- 
response studies. 


Several Factors That Are Generally Not 
Taken Into Account during Component 
Level SEU Testing 

• How often is the component used in the system? 



* Is the component masked? 


* Will the system be affected if the component incurs an 
SEU? 


- Can the SET dissipate prior to causing a system error? 

- Will the SET or SEU be captured by the system? 

- Is the SEU masked or is the system not communicating 
with the component while the SEU exists? 

• If several of the same components exist, are they all 
equally likely to cause a system upset? 

• Can the analysis be considered linear, i.e., can we sum the 
component SEU error rates? 
When Dominant Component Error Rates 
Can Be Used as the System Error Rate 

• The easiest system to evaluate is one where a 
dominant component error rate can be applied. 

- For example, a design implemented in a commercial 
SRAM-based FPGA. The configuration upset rates 
dominate all others. 

• However, this is not always straightforward: 

- If components are SEU tested separately, co- 
dependencies are not taken into account. This can 
change error rates significantly. 

- If components are co-dependent, it is important to 
either test as a system (sub-system) or evaluate how 
the co-dependencies can affect error rates. 

• For example, testing DFF test structures versus DFFs in a 
system design. 


Characterizing SEUs: Radiation 
Testing and SEU Cross Sections 

SEU cross sections (σ_SEU) characterize how many 
upsets will occur based on the number of 
ionizing particles the device is exposed to: 

$$\sigma_{SEU} = \frac{\#\,\text{errors}}{\text{fluence}}$$

Terminology: 

• Flux: Particles/(s·cm²) 

• Fluence: Particles/cm² 

• σ_SEU is calculated at 
several LET values 
(particle spectrum) 
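
The definition above can be sketched in a few lines of Python. This is only an illustration: the function names and the beam numbers (flux, exposure time, error count) are hypothetical and not taken from any test campaign.

```python
def fluence_from_flux(flux: float, exposure_s: float) -> float:
    """Fluence (particles/cm^2) = flux (particles/(s*cm^2)) * exposure time (s)."""
    return flux * exposure_s


def seu_cross_section(num_errors: int, fluence: float) -> float:
    """sigma_SEU (cm^2) = number of observed errors / fluence."""
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    return num_errors / fluence


# Hypothetical run at one LET value: 1e4 particles/(s*cm^2) for 1000 s, 50 errors.
fluence = fluence_from_flux(1.0e4, 1000.0)
sigma = seu_cross_section(50, fluence)
print(f"fluence = {fluence:.2e} particles/cm^2, sigma_SEU = {sigma:.2e} cm^2")
```

Repeating the calculation at each LET value gives the points of the σ_SEU versus LET curve.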
Characterizing SEUs: LET vs. SEU 
Cross Section Graph and How They 
Relate to Error Rates 


$$\sigma_{SEU} = \frac{\#\,\text{errors}}{\text{fluence}}$$

dE/dt is calculated by integrating σ_SEU over 
the LET spectrum using a Weibull fit. 

LET_sat = Saturated LET 
LET_th = Threshold LET 
LET_0.25 = LET at 25% of σ_SAT 
σ_SAT = Saturated SEU Cross Section 

GEO Upset Rate (after Ed Petersen's figure of merit): 

$$\frac{dE}{dt} = \frac{C \cdot \sigma_{SAT}}{LET_{0.25}^{2}}$$

C varies based on the orbit. For GEO, values between 200 and 400 are common. 
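
A rough sketch of the figure-of-merit estimate follows. It assumes the usual Petersen form, in which the LET at 25% of σ_SAT enters squared; the function name and the device values are hypothetical, and the units of the result depend on how C is defined for the orbit.

```python
def fom_upset_rate(sigma_sat_cm2: float, let_025: float, c_orbit: float) -> float:
    """Figure-of-merit upset rate: C * sigma_SAT / LET_0.25^2.

    sigma_sat_cm2: saturated SEU cross section (cm^2)
    let_025:       LET at 25% of sigma_SAT (MeV*cm^2/mg)
    c_orbit:       orbit-dependent constant (assumed 200 to 400 for GEO)
    """
    if let_025 <= 0:
        raise ValueError("LET_0.25 must be positive")
    return c_orbit * sigma_sat_cm2 / let_025 ** 2


# Hypothetical device: sigma_SAT = 1e-7 cm^2, LET_0.25 = 10 MeV*cm^2/mg.
for c in (200.0, 400.0):
    print(f"C = {c:.0f}: dE/dt ~ {fom_upset_rate(1.0e-7, 10.0, c):.2e}")
```

Sweeping C over the quoted GEO range brackets the estimate rather than giving a single number.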


Example of Dominant σ_SEU 

• If the co-dependency between components is 
insignificant, then component error rates can 
be summed; e.g., FPGA high-level internal 
structures: 

SEU cross sections (σ_SEU) = # upsets / (particles/cm²) 

$$P(f_s)_{error} \propto P_{Configuration} + P(f_s)_{functionalLogic} + P_{SEFI}$$

- P(fs)_error: design σ_SEU 

- P_Configuration: configuration σ_SEU 

- P(fs)_functionalLogic: functional logic σ_SEU (sequential and 
combinatorial logic (CL) in data path) 

- P_SEFI: SEFI σ_SEU 
With hardened configuration and hardened 
global routes (e.g., Microsemi RTAX2000S) 


Global Routes 
and Hidden 
Logic 
Taking into Account the Non-Linearity 
of Systems during the 
Extrapolation Process 


How do we extrapolate σ_SEUs to complex 
designs? 
What Forces Non-Linear σ_SEU Extrapolation 

• System Block SEUs 

- How often is the component 
active? 

- Is the component masked? 

- Are global route SETs taken 
into account? 

• SETs 

- Dissipation during propagation 

- Elongation during propagation 

- Masking via logic components 

- Ringing/oscillation due to 
metastability (e.g., transistor 
push-pull during transient 
creation or clock tree SETs). 

[Figure: signal path modeled with resistance (R) and capacitance (C), which set a cutoff frequency (f_c). Each capacitance has its own f_c.] 
SET Characterization via Long 
Inverter Chains 



• Common method for testing SET behavior is to 
use a long chain of inverters. 

• Inverter SET cross sections are calculated by 
counting the number of SETs and dividing by 
the number of inverters. 


Problem: This method assumes all inverters 
have the same probability of upset as seen from 
the observation point (I/O). In addition, this 
method assumes linear behavior. 

[Figure: long chain of inverters driving an I/O block. The I/O block will filter small transients.] 


SEU Cross Sections and Error Rates - 
How We Apply Them to FPGA Designs 



• A goal of SEU testing is to provide error rate (dE(fs)/dt) 
predictions to critical missions. 

• σ_SEUs from SEU testing are used to calculate dE(fs)/dt. 

• dE(fs)/dt for FPGA and ASIC devices is calculated using: 

$$\frac{dE(f_s)}{dt} = \left.\frac{dE_{bit}(f_s)}{dt}\right|_{DFFs} \times (\#UsedDFFs)$$

where dE(fs)/dt is the system upset rate, dE_bit(fs)/dt is the SEU bit upset rate of the DFFs, and #UsedDFFs is the number of used flip-flops. 

• Assumes linearity: all DFFs are used every cycle and 
they have the same probability of upset. 
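
As a minimal sketch of this linear extrapolation, the snippet below multiplies a per-bit upset rate by the number of used flip-flops. The function name and both input values are hypothetical.

```python
def linear_system_upset_rate(bit_upset_rate: float, used_dffs: int) -> float:
    """Linear extrapolation: system rate = per-bit rate * number of used DFFs.

    Implicitly assumes every DFF is used every cycle and that all DFFs
    have the same probability of upset.
    """
    if used_dffs < 0:
        raise ValueError("used_dffs must be non-negative")
    return bit_upset_rate * used_dffs


# Hypothetical numbers: 1e-9 upsets/bit-day and 50,000 used flip-flops.
rate = linear_system_upset_rate(1.0e-9, 50_000)
print(f"system upset rate ~ {rate:.2e} upsets/day")
```

Because nothing in this calculation accounts for masking or capture, it acts as an upper-bound style estimate of the kind the Motivation slide warns can overestimate system error rates.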
Background: Synchronous Design Data 
Path - Sample and Hold 

[Figure: clock waveform with period T_clk = 1/f_s, where f_s is the clock frequency.] 

• CL computes between clock edges. 

Designs are complex - We modularize for simplicity 
Background: Synchronous Data Paths 

[Figure: StartPoint DFFs feed combinatorial logic (CL) with delay τ_dly, which feeds an EndPoint DFF. StartPoint states at clock cycle T-1 determine the EndPoint state at cycle T.] 

Every DFF has a function that 
determines its state: 

$$EndPoint(T) = f(StartPoints(T-1), CL)$$

e.g., (A XOR B) AND (C XOR D) 

• Datapath defined as StartPoint via CL to 
EndPoint. 

• CL and routes create delay (τ_dly) from 
StartPoints to EndPoints. 

• Every data path has a unique τ_dly. 

• τ_dly is calculated using Static Timing 
Analysis (STA) design tools. 

Modularization: Every DFF has a unique cone of logic 
How Can a DFF Contain an Incorrect 
State from an SEU? 

DFFs have various modes of 
reaching a bad state due to SEUs. 
Attribute some modes to EndPoints 
and some to StartPoints. 

We make a clear distinction 
between DFF SEUs based on 
clock state and capture. 

[Figure: DFF_k cone of logic feeding an EndPoint DFF; labels: wrong function, DFF state.] 

EndPoint DFF SEUs + StartPoint DFF SEUs + CL SETs 

- EndPoint DFF SEUs: DFF upsets that occur at the clock edge. 

- StartPoint DFF SEUs: DFF upsets that occur between clock edges and 
are captured by EndPoints. 

- CL SETs: single event transients captured by EndPoints. 
Edge Triggered DFFs... Creating 
Deterministic Boundary Points 

• D input must be settled by rising edge of clock. 

• Output will only change at rising edge of clock. 

[Figure: master-slave DFF; the master latch is clocked by CLK and the slave latch by CLKB.] 

- Master: Clock Low: Transparent; Clock High: Hold 

- Slave: Clock Low: Hold; Clock High: Transparent 

CLK = clock; CLKB = inverted clock 

In order to create precise boundary points of state 
capture, latches are NOT allowed in synchronous designs. 
StartPoint and EndPoint DFF SEUs as a 
Function of Clock State (P(fs)_DFFSEU) 

[Figure: master-slave DFF shown in four clock states:] 

• Clock Low: SEU generated in Slave 

• Clock High→Low: Slave captures its SET 

• Clock High: SEU generated in Master; or SET in Slave 

• Clock Low→High: Master captures its SET 

Summary of Internal DFF SEUs 

$$P(f_s)_{DFFSEU} = \alpha P(f_s)_{DFFSEU} + \beta P(f_s)_{DFFSEU}$$

EndPoint SEU: αP(fs)_DFFSEU 

Percentage of SEUs that 
occur at rising clock edge 

• Master SET gets trapped 
during transition from 
transparent to hold state 
(rising edge of clock). 

• This is considered a state 
change. 

StartPoint SEU: βP(fs)_DFFSEU 

Percentage of SEUs that occur 
between clock edges 

• Master or slave is in hold state, or 
slave captures its own SET during 
transition from transparent to hold 
state. 

• This is not considered a definitive 
state change. 

• Must be captured by an EndPoint to 
cause an incorrect change in system 
state. 

By definition, EndPoint SEUs are already captured into the 
system. How do StartPoints get captured? 
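How Does a StartPoint SEU Get Captured 
by an EndPoint? 

[Figure: StartPoint DFFs A, B, C, and D feed the CL (A XOR B) AND (C XOR D), which feeds an EndPoint DFF; τ_dly = 9.5 ns. The timing diagram spans one clock period T_clk, ending at clock edge T+1.] 

Time slack = T_clk - τ_dly 

If DFF_D flips its state at time = τ, the flip is captured when: 

$$0 < \tau < T_{clk} - \tau_{dly} \quad \text{or, equivalently,} \quad \tau + \tau_{dly} < T_{clk}$$

Probability of capture: 

$$1 - \frac{\tau_{dly}}{T_{clk}} = 1 - \tau_{dly} f_s$$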
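
The capture probability can be checked numerically. The sketch below uses the 9.5 ns path delay shown on this slide; the function name and the two clock frequencies are assumptions chosen for illustration.

```python
def startpoint_capture_probability(tau_dly_s: float, f_s_hz: float) -> float:
    """Probability that a StartPoint flip reaches the EndPoint in time.

    The flip must occur early enough in the clock period that it still
    propagates through tau_dly before the next edge: 1 - tau_dly * f_s.
    """
    fraction_of_period = tau_dly_s * f_s_hz  # tau_dly / T_clk
    if not 0.0 <= fraction_of_period <= 1.0:
        raise ValueError("path delay must fit within one clock period")
    return 1.0 - fraction_of_period


tau_dly = 9.5e-9  # 9.5 ns path delay, as in the slide's example
for f_s in (50e6, 100e6):  # hypothetical clock frequencies
    p = startpoint_capture_probability(tau_dly, f_s)
    print(f"f_s = {f_s / 1e6:.0f} MHz: P(capture) = {p:.3f}")
```

At 100 MHz the slack is only 0.5 ns of a 10 ns period, so very few StartPoint flips are captured; at 50 MHz roughly half are.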
Details of Capturing StartPoint 
DFFs 

$$\sum_{j=1}^{\#StartPointDFFs} \beta P(f_s)_{DFFSEU(j)} \, (1 - \tau_{dly(j)} f_s) \, P_{logic(j)}$$

- βP(fs)_DFFSEU(j): upset generated internally to the DFF between clock edges. 

- (1 - τ_dly(j) fs): design topology and temporal masking. 

- P_logic(j): design topology and logic masking. 

SEU generation occurs in a StartPoint between rising clock 
edges (βP(fs)_DFFSEU). 

StartPoint upsets can be logically masked by logic 
between the StartPoint and its EndPoint. 


* Design topology and temporal effects: 

- Increase path delay (# of gates) - decrease probability of capture. 

- Increase frequency - decrease probability of capture. 
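
The StartPoint sum can be sketched as follows. The `StartPoint` record, the per-DFF probabilities, and the path delays are all hypothetical; in particular, the βP(fs)_DFFSEU term is held constant across frequency here purely to isolate the temporal-masking factor.

```python
from dataclasses import dataclass


@dataclass
class StartPoint:
    beta_p_dffseu: float  # upset generated in the DFF between clock edges
    tau_dly_s: float      # path delay from this StartPoint to the EndPoint (s)
    p_logic: float        # 1 = no logic masking, 0 = fully masked


def startpoint_contribution(startpoints, f_s_hz: float) -> float:
    """Sum over StartPoints of beta*P_DFFSEU * (1 - tau_dly*f_s) * P_logic."""
    return sum(
        sp.beta_p_dffseu * (1.0 - sp.tau_dly_s * f_s_hz) * sp.p_logic
        for sp in startpoints
    )


# Hypothetical cone of logic with two StartPoints.
cone_startpoints = [
    StartPoint(beta_p_dffseu=1.0e-8, tau_dly_s=9.5e-9, p_logic=0.5),
    StartPoint(beta_p_dffseu=1.0e-8, tau_dly_s=4.0e-9, p_logic=1.0),
]
for f_s in (50e6, 100e6):
    total = startpoint_contribution(cone_startpoints, f_s)
    print(f"f_s = {f_s / 1e6:.0f} MHz: StartPoint term = {total:.3e}")
```

With these inputs the term drops as frequency rises, matching the trend stated in the bullets above.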
Synchronous System: CL SET Capture 

[Figure: an SET of width τ_width is generated in the CL, (A XOR B) AND (C XOR D), between the StartPoint DFFs and the EndPoint DFF; τ_dly = 9.5 ns.] 
Details of CL SET Capture 

$$\sum_{i=1}^{\#CombinatorialCells} P_{gen(i)} \, P_{prop(i)} \, P_{logic(i)} \, \tau_{width(i)} f_s$$

- P_gen(i): SET generation. 

- P_prop(i): propagation; electrical masking from routes and gate 
cut-off frequencies. 

- P_logic(i): logic masking. 

- τ_width(i) fs: width of the SET relative to the clock period (τ_width/T_clk). 

• SET Generation (P_gen) occurs between clock edges. 

• EndPoint DFF captures the SET at a clock edge. 

- Increase frequency - increase probability of capture. 

- Increase CL - increase probability of capture. 
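
A comparable sketch for the CL SET sum is shown below. The `CombCell` record and all of its values are hypothetical, and clamping τ_width·f_s at 1 is an added assumption so the capture factor stays a valid probability when an SET is wider than the clock period.

```python
from dataclasses import dataclass


@dataclass
class CombCell:
    p_gen: float        # probability an SET is generated in this cell
    p_prop: float       # probability it survives electrical masking
    p_logic: float      # 1 = no logic masking, 0 = fully masked
    tau_width_s: float  # SET width at the EndPoint (s)


def cl_set_contribution(cells, f_s_hz: float) -> float:
    """Sum over CL cells of P_gen * P_prop * P_logic * tau_width * f_s."""
    return sum(
        c.p_gen * c.p_prop * c.p_logic * min(1.0, c.tau_width_s * f_s_hz)
        for c in cells
    )


# Hypothetical data path: identical cells, 0.5 ns wide SETs.
def make_path(num_cells: int):
    return [CombCell(1.0e-9, 0.8, 0.5, 0.5e-9) for _ in range(num_cells)]


for num_cells in (1, 4):
    for f_s in (50e6, 100e6):
        total = cl_set_contribution(make_path(num_cells), f_s)
        print(f"{num_cells} cell(s), {f_s / 1e6:.0f} MHz: CL term = {total:.2e}")
```

Doubling the frequency or adding cells raises the term, which is the opposite trend from the StartPoint contribution.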
Putting It All Together - Analyzed Per 
Particle Linear Energy Transfer (LET) 

$$\sum_{k=1}^{\#EndPoint} P_{logic(k)} \left( \alpha P(f_s)_{DFFSEU(k)} + \sum_{j=1}^{\#StartPointDFFs} \beta P(f_s)_{DFFSEU(j)} \, (1 - \tau_{dly(j)} f_s) \, P_{logic(j)} + \sum_{i=1}^{\#CL} P_{gen(i)} \, P_{prop(i)} \, P_{logic(i)} \, \tau_{width(i)} f_s \right)$$

- P_logic(k): EndPoint logic masking 

- αP(fs)_DFFSEU(k): EndPoint 

- Sum over j: StartPoints 

- Sum over i: CL 

StartPoints and CL need to be captured by an EndPoint; 
hence, data path derating factors exist. 

Component Contribution to σ_SEU across Frequency and Gate Count 

Component     Frequency                 # of Gates in Path 
EndPoint      Directly Proportional     N/A 
StartPoint    Inversely Proportional    Inversely Proportional 
CL            Directly Proportional     Directly Proportional 
Using the Model to Analyze Heavy 
Ion SEU Cross Sections 
SEU Characterization of a 
Complex System: Microprocessor 


Test-As-You-Fly versus Using Test 
Structures and Extrapolation 
Test Structures versus Final Designs 



• Although error rates and error responses are 
design dependent, useful information can be 
extrapolated from test structures versus the 
final design. 

• Why use test structures versus final designs? 

- By the time the final design is complete, it is usually 
too late to perform radiation testing on it. 

- Can be too difficult to apply input-stimuli to a final 
design. 

- Can be too difficult to monitor DUT responses. 

The following slides give more insight into the benefits of 
using test structures versus full designs during radiation 
testing. 
Best Practice for Radiation Testing: 
Logic Replication for Statistics 


Best-Practice for DUT Test Structure Development: 

• Test structures should contain a large amount of replicated 
logic in order to increase statistics; e.g., shift-registers with 
thousands of stages. 

How Application-Specific Test Structures Violate Best-Practice 
Considerations: 

• Statistics are poor because usually there is not a 
significant amount of replication. 

• In addition, trends for specific elements cannot be 
clearly identified or established. 



SEU testing with hundreds of 
counters versus only one 

[Figure: array of counters (..., Counter 98, Counter 99) feeding a register bank. All counters are simultaneously shifted into the register bank once every 400 (= 4 × 100) clock cycles. Once every 4 clock cycles, the topmost value is output to the tester and the registers shift up: Cell(n-1) <= Cell(n).] 

Best Practice for Radiation Testing: 
State Space Traversal 


Best-Practice for DUT Test Structure Development: 

• A test structure's state space should be traversable such 
that it can be covered within one radiation test run. 

How Application-Specific Test Structures Violate Best-Practice 
Considerations: 

• The state space of a complex design cannot be traversed 
within one radiation test run. 

• Hence, a significant amount of circuitry and system states 
are not tested. 

• The result is SEU data that are uncharacteristic of the design. 

Path-directed test walks through a specified path... 
Best Practice for Radiation Testing: 
Logic Masking 


Best-Practice for DUT 
Test Structure 
Development 

How Application-Specific Test 
Structures Violate Best-Practice 
Considerations 

Logic masking should be 
minimized or 
controllable. 

Application-specific test 
structures contain a 
significantly higher number of 
masked data paths than basic test 
structures. 


P_logic is the probability that an upset will 
not be logically masked, i.e., that it can be 
captured by the system. 

P_logic = 0: path is 100% masked 
P_logic = 1: path has no masking 
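
One way to estimate P_logic for a small cone of logic is exhaustive enumeration. The sketch below follows the convention that P_logic = 1 means no masking, reuses the (A XOR B) AND (C XOR D) example from the data path slides, and assumes all input states are equally likely; the function names are hypothetical.

```python
from itertools import product


def cone(a: int, b: int, c: int, d: int) -> int:
    """Example cone of logic: (A XOR B) AND (C XOR D)."""
    return (a ^ b) & (c ^ d)


def p_logic(func, num_inputs: int, upset_index: int) -> float:
    """Fraction of input states in which flipping one input changes the output.

    1.0 means the upset always propagates (no masking);
    0.0 means the path is 100% masked.
    """
    propagated = 0
    total = 0
    for bits in product((0, 1), repeat=num_inputs):
        flipped = list(bits)
        flipped[upset_index] ^= 1  # inject the upset on one StartPoint
        if func(*bits) != func(*flipped):
            propagated += 1
        total += 1
    return propagated / total


print("P_logic for an upset on A:", p_logic(cone, 4, 0))
```

An upset on A always flips (A XOR B), but it only reaches the output when (C XOR D) is 1, which is why half of the states mask it.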



Best Practice for Radiation Testing: 
Avoiding Unrealistic SEU Accumulation 



Best Practice characteristics of a DUT design: 

Avoid unrealistic SEU accumulation from accelerated testing: 

• Flush through test structures; e.g., shift-registers. 

• Small number of gates per sub-test structure; e.g., testing 
hundreds of counters. 

How Application-Specific Test Structures Violate Best-Practice 
Considerations: 

• Application-specific test structures take up most of the 
DUT's area. There are a lot of co-dependencies between logic. 

• Hence, it is difficult to control SEU accumulation in an 
accelerated test environment. 



SRAM Based FPGAs: Scrubbing (correcting) 
configuration SEUs. Extremely important during 
accelerated testing... must keep up with the 
particle flux to avoid accumulation 
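
A back-of-the-envelope check of whether scrubbing keeps up with the beam can be sketched from the cross-section definition (expected upsets = σ × fluence). The device cross section, flux, scrub period, and the one-upset budget below are all hypothetical values chosen for illustration.

```python
def expected_upsets_per_scrub(sigma_device_cm2: float, flux: float,
                              scrub_period_s: float) -> float:
    """Expected configuration upsets accumulated during one scrub period.

    sigma_device_cm2: configuration cross section of the whole device (cm^2)
    flux:             beam flux (particles/(s*cm^2))
    scrub_period_s:   time between full scrub passes (s)
    """
    return sigma_device_cm2 * flux * scrub_period_s


def max_flux_for_budget(sigma_device_cm2: float, scrub_period_s: float,
                        upset_budget: float = 1.0) -> float:
    """Largest flux that keeps expected upsets per scrub within the budget."""
    return upset_budget / (sigma_device_cm2 * scrub_period_s)


sigma_cfg = 1.0e-3   # hypothetical device-level cross section (cm^2)
scrub_period = 0.1   # hypothetical scrub period (s)
print("upsets per scrub at 1e3 flux:",
      expected_upsets_per_scrub(sigma_cfg, 1.0e3, scrub_period))
print("max flux for <= 1 upset per scrub:",
      max_flux_for_budget(sigma_cfg, scrub_period))
```

If the expected count per scrub period is well above one, upsets accumulate faster than they are corrected and the test no longer reflects the flight environment.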
Best Practice for Radiation Testing: 
Increasing Visibility 


Best Practice characteristics of a DUT design: 

• All (or a significant percentage of) potential 
upsets should be observable during testing. 

• Test structures can easily be designed to enhance 
observable nodes; e.g., shift-registers and counters. 

How Application-Specific Test Structures Violate Best-Practice 
Considerations: 

• A significant number of upsets in a complex design 
are generally not observable during radiation testing. 

• This is true mostly because of logic masking, limitations 
in state space traversal, limitations in I/O count, or 
time of upset propagation to observable node. 
Benefits of Testing Application 
Specific Designs 



• Increased observation of error responses specific 
to the application. 


• However, the user must be aware of the 
following: 

- Unrealistic SEU accumulation in an accelerated 
environment. 


- Limited visibility due to masking and fractional state 
space traversal. 

- Poor statistics due to the variance in design circuits. 

* σ_SEUs will most likely have a large variance if 
circuits are not able to be isolated and 
controlled. 
Case Study 

• DUT is a Xilinx V5QV - radiation hardened 
FPGA. 

• Application-specific test structure is an 
embedded microprocessor (Micro-blaze™). 

• Goal is to determine error rates for using an 
embedded Micro-blaze™ processor in the Xilinx 
V5QV with and without cache. 

- Question: Does using cache in embedded memory 
increase the σ_SEUs such that the Micro-blaze™ will 
not meet project requirements? 


Suggestions on How to Test the 
Application Specific Design 



• Because the goal is to study caching SEU 
effects, test-plan should have a test design that 
contains cache and one that does not. 

• Test basic structures such as shift-registers 
and counters to get an underlying 
understanding of device SEU characteristics. 

• Basic test-structure analysis characterizes: 

- Sequential memory elements (DFFs), 

- Combinatorial logic (CL), and 

- Global routes. 

• Increase visibility of the Micro-blaze™ during 
testing. 


To be presented by Melanie D. Berg at the Single Event Effects (SEE) Symposium and the Military and Aerospace 
Programmable Logic Devices (MAPLD) Workshop, La Jolla, CA, May 19-22, 2014. 




Processor and SRAM Communication 



SRAM: Static random access memory 
BRAM: Block random access memory 

Processors talk to memory 

Micro-blaze™ 

Most processor 
radiation tests 
detect errors by 
erroneous SRAM 
memory writes. 

Visibility is 
significantly 
limited. 




[Figure: Micro-blaze™ (cache, ALU, SRAM interface) connected to the LCDT, which uses FPGA BRAM in place of external SRAM; the data write path is shown.] 

We increase visibility by replacing external SRAM 
with the REAG low-cost digital tester (LCDT). 


More on Increasing Visibility with 
Microprocessor Testing (1) 



• As previously stated, the embedded SRAM in 
the tester (BRAM) takes the place of normal 
memory accesses. 

• In addition, each memory access is time 
stamped and logged in an alternate bank of BRAM. 
Only the last 512 accesses are kept. 

• After each test run, the time stamped logs are 
output to the user. 
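
The logging scheme described above behaves like a fixed-depth ring buffer. The sketch below models it in software with a bounded deque; the `Access` record, the `AccessLog` class, and the synthetic traffic are hypothetical and do not describe the tester's actual implementation.

```python
from collections import deque
from typing import NamedTuple


class Access(NamedTuple):
    timestamp: int  # clock cycle at which the access occurred
    is_write: bool
    addr: int
    data: int


class AccessLog:
    """Keeps only the most recent `depth` memory accesses."""

    def __init__(self, depth: int = 512):
        # A deque with maxlen silently discards the oldest entry when full.
        self._entries = deque(maxlen=depth)

    def record(self, timestamp: int, is_write: bool, addr: int, data: int) -> None:
        self._entries.append(Access(timestamp, is_write, addr, data))

    def dump(self):
        """Return the logged accesses, oldest first (end-of-run readout)."""
        return list(self._entries)


# Synthetic traffic: 600 accesses, so the first 88 fall out of the log.
log = AccessLog()
for cycle in range(600):
    log.record(timestamp=cycle, is_write=(cycle % 2 == 0),
               addr=cycle & 0xFF, data=cycle)

entries = log.dump()
print(len(entries), entries[0].timestamp, entries[-1].timestamp)
```

Keeping only the tail of the traffic preserves the accesses closest to the failure, which are usually the ones needed to reconstruct how the processor went wrong.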


[Figure: time-stamped log entries (Timestamp | ADDR | DATA) with read and write addresses.] 
More on Increasing Visibility with 
Microprocessor Testing (2) 


DUT: device under test 

[Figure: DUT signals monitored by the tester:] 

- Halted 
- Error 
- Trace Instruction 
- Trace Valid Instruction 
- Trace Exception Taken 
- Trace Exception Kind 
- Trace Register Write 
- Trace Register Address 
- Trace Data Cache Request 
- Trace Data Cache Hit 
- Trace Data Cache Ready 
- Trace Data Cache Read 
- Trace Instruction Cache Request 
- Trace Instruction Cache ... 

Tester: Watchdogs. Send watchdog errors to host computer. 

Summary of Case Study Test 
Enhancements 

* Visibility was increased by isolating memory accesses 
as follows: 

- Moving the instruction and data storage to the LCDT for traffic 
observation. 

- Performing tests with and without cache to determine the 
influence cache has on upsets. 

* Differentiating global upsets from the normal data set: 

- Helped to understand which upsets are prominent. 

- Gave insight into how the use of cache will affect σ_SEUs. 

* Monitoring internal Micro-blaze™ signals 

- σ_SEUs no longer rely on detecting erroneous memory reads and 
writes. Data are too limited and uninformative when relying 
solely on memory reads and writes. 

- Can now determine when a processor crashes and how. 
Comparing Micro-blaze™ σ_SEUs and 
Global Clock σ_SEUs 

[Figure: SEU cross sections, cache vs. no cache with global routes.] 
Floor Is Open To Discussion 




